Overview

Dataset statistics

Number of variables: 13
Number of observations: 6255
Missing cells: 21762
Missing cells (%): 26.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 592.6 KiB
Average record size in memory: 97.0 B
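
As a cross-check, the missing-cell figures can be reproduced from the per-column missing counts listed in the Alerts section. The sketch below only re-does that arithmetic with numbers copied from this report; the variable names are illustrative.

```python
# Per-column missing counts, copied from the Alerts section of this report.
missing_per_column = {
    "Promotion1": 4153,
    "Promotion2": 4663,
    "Promotion3": 4370,
    "Promotion4": 4436,
    "Promotion5": 4140,
}

n_variables = 13
n_observations = 6255

# Total cells = rows x columns; missing share = missing cells / total cells.
missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, total_cells, round(missing_pct, 1))
```

All missing cells come from the five Promotion columns; the other eight variables are complete.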

Variable types

Numeric: 11
Categorical: 1
Boolean: 1

Alerts

Date has a high cardinality: 139 distinct values [High cardinality]
id is highly correlated with Store [High correlation]
Store is highly correlated with id [High correlation]
Promotion1 is highly correlated with Promotion4 and 2 other fields [High correlation]
Promotion4 is highly correlated with Promotion1 [High correlation]
Promotion5 is highly correlated with Promotion1 and 1 other field [High correlation]
Weekly_Sales is highly correlated with Promotion1 and 1 other field [High correlation]
Promotion1 is highly correlated with Promotion4 [High correlation]
id is highly correlated with Store and 2 other fields [High correlation]
Store is highly correlated with id and 2 other fields [High correlation]
Fuel_Price is highly correlated with Unemployment [High correlation]
Unemployment is highly correlated with id and 2 other fields [High correlation]
Weekly_Sales is highly correlated with id and 1 other field [High correlation]
Promotion1 has 4153 (66.4%) missing values [Missing]
Promotion2 has 4663 (74.5%) missing values [Missing]
Promotion3 has 4370 (69.9%) missing values [Missing]
Promotion4 has 4436 (70.9%) missing values [Missing]
Promotion5 has 4140 (66.2%) missing values [Missing]
id is uniformly distributed [Uniform]
Date is uniformly distributed [Uniform]
id has unique values [Unique]
Weekly_Sales has unique values [Unique]

Reproduction

Analysis started: 2022-07-20 02:39:39.731113
Analysis finished: 2022-07-20 02:40:03.825868
Duration: 24.09 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

id
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 6255
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3128
Minimum: 1
Maximum: 6255
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 313.7
Q1: 1564.5
median: 3128
Q3: 4691.5
95-th percentile: 5942.3
Maximum: 6255
Range: 6254
Interquartile range (IQR): 3127

Descriptive statistics

Standard deviation: 1805.807299
Coefficient of variation (CV): 0.5773041236
Kurtosis: -1.2
Mean: 3128
Median Absolute Deviation (MAD): 1564
Skewness: 0
Sum: 19565640
Variance: 3260940
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | < 0.1%
4155 | 1 | < 0.1%
4177 | 1 | < 0.1%
4176 | 1 | < 0.1%
4175 | 1 | < 0.1%
4174 | 1 | < 0.1%
4173 | 1 | < 0.1%
4172 | 1 | < 0.1%
4171 | 1 | < 0.1%
4170 | 1 | < 0.1%
Other values (6245) | 6245 | 99.8%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
6255 | 1 | < 0.1%
6254 | 1 | < 0.1%
6253 | 1 | < 0.1%
6252 | 1 | < 0.1%
6251 | 1 | < 0.1%
6250 | 1 | < 0.1%
6249 | 1 | < 0.1%
6248 | 1 | < 0.1%
6247 | 1 | < 0.1%
6246 | 1 | < 0.1%
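
Because id is simply the integers 1 to 6255, its descriptive statistics can be reproduced exactly. The sketch below uses only the Python standard library and assumes the report uses the sample variance (n - 1 in the denominator), which is consistent with the reported value of 3260940.

```python
import math
import statistics

# id takes every integer from 1 to 6255 exactly once.
ids = list(range(1, 6256))

total = sum(ids)
mean = statistics.mean(ids)
variance = statistics.variance(ids)      # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
median = statistics.median(ids)
mad = statistics.median(abs(x - median) for x in ids)  # median absolute deviation

print(total, mean, variance, round(std, 6), round(cv, 10), mad)
```

The coefficient of variation is the standard deviation divided by the mean, and the MAD is the median of the absolute deviations from the median.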

Store
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 45
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23
Minimum: 1
Maximum: 45
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
median: 23
Q3: 34
95-th percentile: 43
Maximum: 45
Range: 44
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 12.98821143
Coefficient of variation (CV): 0.5647048447
Kurtosis: -1.201186658
Mean: 23
Median Absolute Deviation (MAD): 11
Skewness: 0
Sum: 143865
Variance: 168.6936361
Monotonicity: Increasing
Histogram with fixed size bins (bins=45)
Value | Count | Frequency (%)
1 | 139 | 2.2%
24 | 139 | 2.2%
26 | 139 | 2.2%
27 | 139 | 2.2%
28 | 139 | 2.2%
29 | 139 | 2.2%
30 | 139 | 2.2%
31 | 139 | 2.2%
32 | 139 | 2.2%
33 | 139 | 2.2%
Other values (35) | 4865 | 77.8%

Value | Count | Frequency (%)
1 | 139 | 2.2%
2 | 139 | 2.2%
3 | 139 | 2.2%
4 | 139 | 2.2%
5 | 139 | 2.2%
6 | 139 | 2.2%
7 | 139 | 2.2%
8 | 139 | 2.2%
9 | 139 | 2.2%
10 | 139 | 2.2%

Value | Count | Frequency (%)
45 | 139 | 2.2%
44 | 139 | 2.2%
43 | 139 | 2.2%
42 | 139 | 2.2%
41 | 139 | 2.2%
40 | 139 | 2.2%
39 | 139 | 2.2%
38 | 139 | 2.2%
37 | 139 | 2.2%
36 | 139 | 2.2%

Date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 139
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 49.0 KiB

05/02/2010: 45
02/12/2011: 45
21/10/2011: 45
28/10/2011: 45
04/11/2011: 45
Other values (134): 6030

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 05/02/2010
2nd row: 12/02/2010
3rd row: 19/02/2010
4th row: 26/02/2010
5th row: 05/03/2010
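
The report treats Date as a categorical string, which is why it raises the high-cardinality alert. The sample rows look like day-first dates (the value 19/02/2010 cannot be month-first), so a minimal sketch of parsing them is shown below. The "%d/%m/%Y" format is an assumption inferred from the sample rows, not something the report states.

```python
from datetime import datetime

# The five sample rows shown above.
sample = ["05/02/2010", "12/02/2010", "19/02/2010", "26/02/2010", "05/03/2010"]

# Assumed day/month/year layout; "19/02/2010" rules out month-first parsing.
parsed = [datetime.strptime(s, "%d/%m/%Y").date() for s in sample]

# Gap in days between consecutive sample dates.
gaps = [(later - earlier).days for earlier, later in zip(parsed, parsed[1:])]
print(gaps)
```

Under that assumption the sample dates are exactly one week apart, which fits a weekly sales series.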

Common Values

Value | Count | Frequency (%)
05/02/2010 | 45 | 0.7%
02/12/2011 | 45 | 0.7%
21/10/2011 | 45 | 0.7%
28/10/2011 | 45 | 0.7%
04/11/2011 | 45 | 0.7%
11/11/2011 | 45 | 0.7%
18/11/2011 | 45 | 0.7%
25/11/2011 | 45 | 0.7%
09/12/2011 | 45 | 0.7%
03/02/2012 | 45 | 0.7%
Other values (129) | 5805 | 92.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
05/02/2010 | 45 | 0.7%
09/04/2010 | 45 | 0.7%
21/05/2010 | 45 | 0.7%
14/05/2010 | 45 | 0.7%
07/05/2010 | 45 | 0.7%
30/04/2010 | 45 | 0.7%
23/04/2010 | 45 | 0.7%
16/04/2010 | 45 | 0.7%
02/04/2010 | 45 | 0.7%
08/10/2010 | 45 | 0.7%
Other values (129) | 5805 | 92.8%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

Temperature
Real number (ℝ)

Distinct: 3470
Distinct (%): 55.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.63919904
Minimum: -2.06
Maximum: 100.14
Zeros: 0
Zeros (%): 0.0%
Negative: 1
Negative (%): < 0.1%
Memory size: 49.0 KiB

Quantile statistics

Minimum: -2.06
5-th percentile: 27.525
Q1: 47.17
median: 62.72
Q3: 75.22
95-th percentile: 87.765
Maximum: 100.14
Range: 102.2
Interquartile range (IQR): 28.05

Descriptive statistics

Standard deviation: 18.62409426
Coefficient of variation (CV): 0.3071296217
Kurtosis: -0.6450981937
Mean: 60.63919904
Median Absolute Deviation (MAD): 13.95
Skewness: -0.3330630907
Sum: 379298.19
Variance: 346.8568872
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
50.43 | 11 | 0.2%
67.87 | 10 | 0.2%
72.62 | 9 | 0.1%
76.67 | 9 | 0.1%
64.05 | 8 | 0.1%
64.21 | 8 | 0.1%
70.87 | 7 | 0.1%
76.03 | 7 | 0.1%
50.56 | 7 | 0.1%
44.42 | 7 | 0.1%
Other values (3460) | 6172 | 98.7%

Value | Count | Frequency (%)
-2.06 | 1 | < 0.1%
5.54 | 1 | < 0.1%
6.23 | 1 | < 0.1%
7.46 | 1 | < 0.1%
9.51 | 1 | < 0.1%
9.55 | 1 | < 0.1%
10.09 | 1 | < 0.1%
10.11 | 1 | < 0.1%
10.24 | 1 | < 0.1%
10.53 | 1 | < 0.1%

Value | Count | Frequency (%)
100.14 | 1 | < 0.1%
100.07 | 1 | < 0.1%
99.66 | 1 | < 0.1%
99.22 | 3 | < 0.1%
99.2 | 1 | < 0.1%
98.43 | 1 | < 0.1%
98.15 | 1 | < 0.1%
97.66 | 1 | < 0.1%
97.6 | 1 | < 0.1%
97.18 | 3 | < 0.1%

Fuel_Price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 877
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.344369305
Minimum: 2.472
Maximum: 4.308
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 2.472
5-th percentile: 2.642
Q1: 2.917
median: 3.413
Q3: 3.722
95-th percentile: 4.0216
Maximum: 4.308
Range: 1.836
Interquartile range (IQR): 0.805

Descriptive statistics

Standard deviation: 0.4553641003
Coefficient of variation (CV): 0.1361584379
Kurtosis: -1.219720747
Mean: 3.344369305
Median Absolute Deviation (MAD): 0.39
Skewness: -0.08215093034
Sum: 20919.03
Variance: 0.2073564638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3.638 | 39 | 0.6%
3.63 | 34 | 0.5%
3.891 | 29 | 0.5%
2.771 | 29 | 0.5%
3.524 | 28 | 0.4%
2.72 | 28 | 0.4%
3.666 | 27 | 0.4%
3.523 | 27 | 0.4%
3.842 | 25 | 0.4%
3.129 | 25 | 0.4%
Other values (867) | 5964 | 95.3%

Value | Count | Frequency (%)
2.472 | 1 | < 0.1%
2.513 | 1 | < 0.1%
2.514 | 14 | 0.2%
2.52 | 1 | < 0.1%
2.533 | 1 | < 0.1%
2.539 | 1 | < 0.1%
2.54 | 2 | < 0.1%
2.542 | 1 | < 0.1%
2.545 | 1 | < 0.1%
2.548 | 14 | 0.2%

Value | Count | Frequency (%)
4.308 | 3 | < 0.1%
4.294 | 6 | 0.1%
4.293 | 3 | < 0.1%
4.288 | 3 | < 0.1%
4.282 | 3 | < 0.1%
4.277 | 6 | 0.1%
4.273 | 6 | 0.1%
4.254 | 6 | 0.1%
4.25 | 3 | < 0.1%
4.222 | 3 | < 0.1%

Promotion1
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 2099
Distinct (%): 99.9%
Missing: 4153
Missing (%): 66.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 7155.930661
Minimum: 0.27
Maximum: 88646.76
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 0.27
5-th percentile: 98.194
Q1: 1844.295
median: 5221.14
Q3: 9199.2425
95-th percentile: 21966.923
Maximum: 88646.76
Range: 88646.49
Interquartile range (IQR): 7354.9475

Descriptive statistics

Standard deviation: 8408.206085
Coefficient of variation (CV): 1.174998261
Kurtosis: 17.05655515
Mean: 7155.930661
Median Absolute Deviation (MAD): 3658.545
Skewness: 3.281296742
Sum: 15041766.25
Variance: 70697929.56
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
175.64 | 2 | < 0.1%
460.73 | 2 | < 0.1%
1.5 | 2 | < 0.1%
292.53 | 1 | < 0.1%
799.32 | 1 | < 0.1%
200.86 | 1 | < 0.1%
164.49 | 1 | < 0.1%
92.13 | 1 | < 0.1%
283.01 | 1 | < 0.1%
971.97 | 1 | < 0.1%
Other values (2089) | 2089 | 33.4%
(Missing) | 4153 | 66.4%

Value | Count | Frequency (%)
0.27 | 1 | < 0.1%
0.5 | 1 | < 0.1%
1.5 | 2 | < 0.1%
1.94 | 1 | < 0.1%
2.12 | 1 | < 0.1%
2.4 | 1 | < 0.1%
2.42 | 1 | < 0.1%
2.43 | 1 | < 0.1%
2.8 | 1 | < 0.1%
2.91 | 1 | < 0.1%

Value | Count | Frequency (%)
88646.76 | 1 | < 0.1%
78124.5 | 1 | < 0.1%
75149.79 | 1 | < 0.1%
65021.23 | 1 | < 0.1%
62567.6 | 1 | < 0.1%
62172.73 | 1 | < 0.1%
60740.64 | 1 | < 0.1%
60394.73 | 1 | < 0.1%
58928.52 | 1 | < 0.1%
56917.7 | 1 | < 0.1%

Promotion2
Real number (ℝ)

MISSING

Distinct: 1457
Distinct (%): 91.5%
Missing: 4663
Missing (%): 74.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 3308.12581
Minimum: -265.76
Maximum: 104519.54
Zeros: 3
Zeros (%): < 0.1%
Negative: 18
Negative (%): 0.3%
Memory size: 49.0 KiB

Quantile statistics

Minimum: -265.76
5-th percentile: 1.91
Q1: 39.755
median: 205.41
Q3: 1931.005
95-th percentile: 15721.981
Maximum: 104519.54
Range: 104785.3
Interquartile range (IQR): 1891.25

Descriptive statistics

Standard deviation: 9382.823804
Coefficient of variation (CV): 2.83629594
Kurtosis: 38.08772863
Mean: 3308.12581
Median Absolute Deviation (MAD): 198.42
Skewness: 5.466413849
Sum: 5266536.29
Variance: 88037382.53
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1.5 | 9 | 0.1%
0.5 | 9 | 0.1%
1.91 | 8 | 0.1%
3 | 7 | 0.1%
6 | 6 | 0.1%
4 | 6 | 0.1%
19 | 5 | 0.1%
3.82 | 5 | 0.1%
7.64 | 5 | 0.1%
5.73 | 5 | 0.1%
Other values (1447) | 1527 | 24.4%
(Missing) | 4663 | 74.5%

Value | Count | Frequency (%)
-265.76 | 1 | < 0.1%
-192 | 1 | < 0.1%
-20 | 1 | < 0.1%
-10.98 | 1 | < 0.1%
-10.5 | 2 | < 0.1%
-9.98 | 1 | < 0.1%
-9.94 | 1 | < 0.1%
-7.6 | 1 | < 0.1%
-6.69 | 1 | < 0.1%
-5.98 | 1 | < 0.1%

Value | Count | Frequency (%)
104519.54 | 1 | < 0.1%
97740.99 | 1 | < 0.1%
92523.94 | 1 | < 0.1%
89121.94 | 1 | < 0.1%
82881.16 | 1 | < 0.1%
72413.71 | 1 | < 0.1%
70574.85 | 1 | < 0.1%
58804.91 | 1 | < 0.1%
58046.41 | 1 | < 0.1%
56106.2 | 1 | < 0.1%

Promotion3
Real number (ℝ)

MISSING

Distinct: 1562
Distinct (%): 82.9%
Missing: 4370
Missing (%): 69.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1462.535523
Minimum: -29.1
Maximum: 141630.61
Zeros: 1
Zeros (%): < 0.1%
Negative: 4
Negative (%): 0.1%
Memory size: 49.0 KiB

Quantile statistics

Minimum: -29.1
5-th percentile: 0.6
Q1: 4.7
median: 24.6
Q3: 104.01
95-th percentile: 1073.828
Maximum: 141630.61
Range: 141659.71
Interquartile range (IQR): 99.31

Descriptive statistics

Standard deviation: 9667.580258
Coefficient of variation (CV): 6.610150734
Kurtosis: 76.23384752
Mean: 1462.535523
Median Absolute Deviation (MAD): 22.8
Skewness: 8.313267116
Sum: 2756879.46
Variance: 93462108.05
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3 | 10 | 0.2%
2 | 9 | 0.1%
1 | 9 | 0.1%
0.22 | 7 | 0.1%
0.5 | 6 | 0.1%
0.6 | 6 | 0.1%
0.01 | 6 | 0.1%
6 | 6 | 0.1%
1.98 | 6 | 0.1%
3.2 | 6 | 0.1%
Other values (1552) | 1814 | 29.0%
(Missing) | 4370 | 69.9%

Value | Count | Frequency (%)
-29.1 | 1 | < 0.1%
-1 | 1 | < 0.1%
-0.87 | 1 | < 0.1%
-0.2 | 1 | < 0.1%
0 | 1 | < 0.1%
0.01 | 6 | 0.1%
0.02 | 1 | < 0.1%
0.04 | 4 | 0.1%
0.05 | 1 | < 0.1%
0.06 | 3 | < 0.1%

Value | Count | Frequency (%)
141630.61 | 1 | < 0.1%
109030.75 | 1 | < 0.1%
103991.94 | 1 | < 0.1%
101378.79 | 1 | < 0.1%
89402.64 | 1 | < 0.1%
88805.58 | 1 | < 0.1%
83340.33 | 1 | < 0.1%
83192.81 | 1 | < 0.1%
79621.2 | 1 | < 0.1%
77451.26 | 1 | < 0.1%

Promotion4
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1798
Distinct (%): 98.8%
Missing: 4436
Missing (%): 70.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 3465.952501
Minimum: 0.41
Maximum: 67474.85
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 0.41
5-th percentile: 23.752
Q1: 499.895
median: 1532.63
Q3: 3640.905
95-th percentile: 13114.42
Maximum: 67474.85
Range: 67474.44
Interquartile range (IQR): 3141.01

Descriptive statistics

Standard deviation: 6413.116294
Coefficient of variation (CV): 1.850318575
Kurtosis: 28.70315883
Mean: 3465.952501
Median Absolute Deviation (MAD): 1231
Skewness: 4.744796878
Sum: 6304567.6
Variance: 41128060.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
9 | 4 | 0.1%
4 | 3 | < 0.1%
2 | 3 | < 0.1%
17 | 2 | < 0.1%
12 | 2 | < 0.1%
2.5 | 2 | < 0.1%
657.56 | 2 | < 0.1%
67.72 | 2 | < 0.1%
5 | 2 | < 0.1%
22.5 | 2 | < 0.1%
Other values (1788) | 1795 | 28.7%
(Missing) | 4436 | 70.9%

Value | Count | Frequency (%)
0.41 | 1 | < 0.1%
0.46 | 1 | < 0.1%
0.78 | 1 | < 0.1%
0.87 | 1 | < 0.1%
0.92 | 1 | < 0.1%
1.5 | 1 | < 0.1%
1.88 | 1 | < 0.1%
1.98 | 1 | < 0.1%
2 | 3 | < 0.1%
2.28 | 1 | < 0.1%

Value | Count | Frequency (%)
67474.85 | 1 | < 0.1%
57817.56 | 1 | < 0.1%
57815.43 | 1 | < 0.1%
53603.99 | 1 | < 0.1%
52739.02 | 1 | < 0.1%
48403.53 | 1 | < 0.1%
48159.86 | 1 | < 0.1%
48086.64 | 1 | < 0.1%
47452.43 | 1 | < 0.1%
46238.28 | 1 | < 0.1%

Promotion5
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 2114
Distinct (%): > 99.9%
Missing: 4140
Missing (%): 66.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 4518.993173
Minimum: 135.16
Maximum: 108519.28
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 135.16
5-th percentile: 671.815
Q1: 1742.305
median: 3226.41
Q3: 5444.03
95-th percentile: 10978.358
Maximum: 108519.28
Range: 108384.12
Interquartile range (IQR): 3701.725

Descriptive statistics

Standard deviation: 6048.661908
Coefficient of variation (CV): 1.338497687
Kurtosis: 106.8102016
Mean: 4518.993173
Median Absolute Deviation (MAD): 1656.68
Skewness: 8.179740726
Sum: 9557670.56
Variance: 36586310.87
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2743.18 | 2 | < 0.1%
1987.38 | 1 | < 0.1%
3942.03 | 1 | < 0.1%
6256.02 | 1 | < 0.1%
3413.9 | 1 | < 0.1%
6207.39 | 1 | < 0.1%
5199.2 | 1 | < 0.1%
11604.37 | 1 | < 0.1%
24475.38 | 1 | < 0.1%
492.36 | 1 | < 0.1%
Other values (2104) | 2104 | 33.6%
(Missing) | 4140 | 66.2%

Value | Count | Frequency (%)
135.16 | 1 | < 0.1%
153.04 | 1 | < 0.1%
153.9 | 1 | < 0.1%
164.08 | 1 | < 0.1%
170.64 | 1 | < 0.1%
171.76 | 1 | < 0.1%
212.75 | 1 | < 0.1%
224.86 | 1 | < 0.1%
227.12 | 1 | < 0.1%
239.99 | 1 | < 0.1%

Value | Count | Frequency (%)
108519.28 | 1 | < 0.1%
105223.11 | 1 | < 0.1%
85851.87 | 1 | < 0.1%
63005.58 | 1 | < 0.1%
58068.14 | 1 | < 0.1%
57029.78 | 1 | < 0.1%
53212.72 | 1 | < 0.1%
37581.27 | 1 | < 0.1%
36430.33 | 1 | < 0.1%
36360.42 | 1 | < 0.1%

Unemployment
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 321
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.029235651
Minimum: 4.077
Maximum: 14.313
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 4.077
5-th percentile: 5.401
Q1: 6.9165
median: 7.906
Q3: 8.622
95-th percentile: 12.187
Maximum: 14.313
Range: 10.236
Interquartile range (IQR): 1.7055

Descriptive statistics

Standard deviation: 1.874874599
Coefficient of variation (CV): 0.2335059874
Kurtosis: 2.666559273
Mean: 8.029235651
Median Absolute Deviation (MAD): 0.838
Skewness: 1.211816448
Sum: 50222.869
Variance: 3.515154762
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
8.099 | 78 | 1.2%
8.163 | 56 | 0.9%
7.852 | 56 | 0.9%
6.565 | 52 | 0.8%
6.891 | 52 | 0.8%
7.057 | 52 | 0.8%
7.441 | 52 | 0.8%
7.931 | 52 | 0.8%
8.2 | 52 | 0.8%
7.343 | 50 | 0.8%
Other values (311) | 5703 | 91.2%

Value | Count | Frequency (%)
4.077 | 13 | 0.2%
4.125 | 26 | 0.4%
4.156 | 26 | 0.4%
4.261 | 26 | 0.4%
4.308 | 13 | 0.2%
4.42 | 26 | 0.4%
4.584 | 28 | 0.4%
4.607 | 13 | 0.2%
4.781 | 26 | 0.4%
5.114 | 24 | 0.4%

Value | Count | Frequency (%)
14.313 | 42 | 0.7%
14.18 | 39 | 0.6%
14.099 | 39 | 0.6%
14.021 | 36 | 0.6%
13.975 | 24 | 0.4%
13.736 | 39 | 0.6%
13.503 | 42 | 0.7%
12.89 | 39 | 0.6%
12.187 | 39 | 0.6%
11.627 | 39 | 0.6%

IsHoliday
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value | Count | Frequency (%)
False | 5805 | 92.8%
True | 450 | 7.2%

Weekly_Sales
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 6255
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1047619.074
Minimum: 209986.25
Maximum: 3818686.45
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.0 KiB

Quantile statistics

Minimum: 209986.25
5-th percentile: 308361.488
Q1: 553869.475
median: 960476.1
Q3: 1421209.375
95-th percentile: 2050440.157
Maximum: 3818686.45
Range: 3608700.2
Interquartile range (IQR): 867339.9

Descriptive statistics

Standard deviation: 565436.1856
Coefficient of variation (CV): 0.5397345273
Kurtosis: 0.06879449217
Mean: 1047619.074
Median Absolute Deviation (MAD): 425897.32
Skewness: 0.6733344437
Sum: 6552857307
Variance: 3.1971808 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1643690.9 | 1 | < 0.1%
438523.24 | 1 | < 0.1%
1332261.01 | 1 | < 0.1%
1366193.35 | 1 | < 0.1%
1384870.51 | 1 | < 0.1%
1344354.41 | 1 | < 0.1%
1473386.75 | 1 | < 0.1%
1543947.23 | 1 | < 0.1%
1469252.05 | 1 | < 0.1%
425410.04 | 1 | < 0.1%
Other values (6245) | 6245 | 99.8%

Value | Count | Frequency (%)
209986.25 | 1 | < 0.1%
213538.32 | 1 | < 0.1%
215359.21 | 1 | < 0.1%
219804.85 | 1 | < 0.1%
220060.35 | 1 | < 0.1%
224031.19 | 1 | < 0.1%
224294.39 | 1 | < 0.1%
224639.76 | 1 | < 0.1%
224806.96 | 1 | < 0.1%
226702.36 | 1 | < 0.1%

Value | Count | Frequency (%)
3818686.45 | 1 | < 0.1%
3766687.43 | 1 | < 0.1%
3749057.69 | 1 | < 0.1%
3676388.98 | 1 | < 0.1%
3595903.2 | 1 | < 0.1%
3556766.03 | 1 | < 0.1%
3555371.03 | 1 | < 0.1%
3526713.39 | 1 | < 0.1%
3487986.89 | 1 | < 0.1%
3436007.68 | 1 | < 0.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
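
The definition above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. The helper names (`ranks`, `pearson`, `spearman`) and the toy data are illustrative, and tied values are given average ranks, which is a common convention assumed here rather than something the report specifies.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(ranks(x), ranks(y))

# Toy data: y = x**2 is monotonic but not linear in x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.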

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
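
A minimal sketch of this formula, again in plain Python with an illustrative helper name and made-up data, also shows the invariance property: shifting and positively rescaling either variable leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/(n - 1) factors cancel, so raw sums of squares are used directly.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up data with a roughly linear relationship.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Change location and scale of each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor flips the sign of r but not its magnitude.
r_flipped = pearson_r(x, [-b for b in y])

print(r_original, r_rescaled, r_flipped)
```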

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
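
The pair-counting recipe above corresponds to the simplest form of the coefficient, often called tau-a. The sketch below implements exactly that description. Library implementations often use a tie-corrected variant instead, so results on data with ties may differ; the function name and toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move in the same direction
            concordant += 1
        elif direction < 0:    # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie; it counts toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy data: one swapped pair (3 vs 2) out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.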

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
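
To make "nullity correlation" concrete, here is a small sketch: each column is turned into a 0/1 presence indicator, and the indicators are correlated with the Pearson formula. The rows are illustrative values rather than data taken from this dataset, and the helper names are hypothetical.

```python
import math

# Illustrative rows only; None marks a missing value.
rows = [
    {"Promotion1": None, "Promotion2": None},
    {"Promotion1": None, "Promotion2": None},
    {"Promotion1": 5753.8, "Promotion2": 167.9},
    {"Promotion1": 11407.9, "Promotion2": None},
    {"Promotion1": 8452.2, "Promotion2": 92.3},
]

def presence(column):
    """1 where the value is present, 0 where it is missing."""
    return [0 if row[column] is None else 1 for row in rows]

def pearson(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

p1 = presence("Promotion1")
p2 = presence("Promotion2")
nullity_corr = pearson(p1, p2)
print(p1, p2, round(nullity_corr, 3))
```

A value near +1 means the two columns tend to be missing together. A column with no missing values has a constant indicator, so its nullity correlation is undefined (the denominator is zero) and such columns are normally left out.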

Sample

First rows

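index | id | Store | Date | Temperature | Fuel_Price | Promotion1 | Promotion2 | Promotion3 | Promotion4 | Promotion5 | Unemployment | IsHoliday | Weekly_Sales
0 | 1 | 1 | 05/02/2010 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1643690.90
1 | 2 | 1 | 12/02/2010 | 38.51 | 2.548 | NaN | NaN | NaN | NaN | NaN | 8.106 | True | 1641957.44
2 | 3 | 1 | 19/02/2010 | 39.93 | 2.514 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1611968.17
3 | 4 | 1 | 26/02/2010 | 46.63 | 2.561 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1409727.59
4 | 5 | 1 | 05/03/2010 | 46.50 | 2.625 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1554806.68
5 | 6 | 1 | 12/03/2010 | 57.79 | 2.667 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1439541.59
6 | 7 | 1 | 19/03/2010 | 54.58 | 2.720 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1472515.79
7 | 8 | 1 | 26/03/2010 | 51.45 | 2.732 | NaN | NaN | NaN | NaN | NaN | 8.106 | False | 1404429.92
8 | 9 | 1 | 02/04/2010 | 62.27 | 2.719 | NaN | NaN | NaN | NaN | NaN | 7.808 | False | 1594968.28
9 | 10 | 1 | 09/04/2010 | 65.86 | 2.770 | NaN | NaN | NaN | NaN | NaN | 7.808 | False | 1545418.53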

Last rows

index | id | Store | Date | Temperature | Fuel_Price | Promotion1 | Promotion2 | Promotion3 | Promotion4 | Promotion5 | Unemployment | IsHoliday | Weekly_Sales
6245 | 6246 | 45 | 27/07/2012 | 77.20 | 3.647 | 5753.81 | 167.95 | 1.23 | 9181.48 | 3156.06 | 8.684 | False | 711671.58
6246 | 6247 | 45 | 03/08/2012 | 76.58 | 3.654 | 24853.05 | 39.56 | 17.96 | 11142.69 | 2768.32 | 8.684 | False | 725729.51
6247 | 6248 | 45 | 10/08/2012 | 78.65 | 3.722 | 17868.84 | 50.60 | 57.66 | 2593.93 | 1890.59 | 8.684 | False | 733037.32
6248 | 6249 | 45 | 17/08/2012 | 75.71 | 3.807 | 3657.79 | 6.00 | 0.30 | 1630.50 | 3794.22 | 8.684 | False | 722496.93
6249 | 6250 | 45 | 24/08/2012 | 72.62 | 3.834 | 7936.20 | 58.38 | 22.00 | 5518.07 | 2291.97 | 8.684 | False | 718232.26
6250 | 6251 | 45 | 31/08/2012 | 75.09 | 3.867 | 23641.30 | 6.00 | 92.93 | 6988.31 | 3992.13 | 8.684 | False | 734297.87
6251 | 6252 | 45 | 07/09/2012 | 75.70 | 3.911 | 11024.45 | 12.80 | 52.63 | 1854.77 | 2055.70 | 8.684 | True | 766512.66
6252 | 6253 | 45 | 14/09/2012 | 67.87 | 3.948 | 11407.95 | NaN | 4.30 | 3421.72 | 5268.92 | 8.684 | False | 702238.27
6253 | 6254 | 45 | 21/09/2012 | 65.32 | 4.038 | 8452.20 | 92.28 | 63.24 | 2376.38 | 8670.40 | 8.684 | False | 723086.20
6254 | 6255 | 45 | 28/09/2012 | 64.88 | 3.997 | 4556.61 | 20.64 | 1.50 | 1601.01 | 3288.25 | 8.684 | False | 713173.95